Abstract. We consider some questions about formal languages that
arise when inverses of letters, words and languages are defined. The re- 
duced representation of a language over the free monoid is its unique 
equivalent representation in the free group. We show that the class of reg- 
ular languages is closed under taking the reduced representation, while 
the class of context-free languages is not. We also give an upper bound 
on the state complexity of the reduced representation of a regular lan- 
guage, and prove upper and lower bounds on the length of the shortest 
reducible string in a regular language. Finally we show that the set of 
all words which are equivalent to the words in a regular language can 
be nonregular, and that regular languages are not closed under taking a 
generalized form of the reduced representation. 

1 Introduction 

A word in a free group can be represented in many different ways. For example,
aaa^{-1} and aa^{-1}baa^{-1}b^{-1}a are two different ways to write the word a. Among all
the different representations, however, there is one containing no occurrences of
a letter next to its own inverse. We call such a word reduced. In this paper
we consider some basic questions about formal languages and their reduced
representations. In previous work on automatic groups, these notions of inverse
symbols and reduced words have been studied, but only with regard to automata
that are assumed to generate groups [1].

First we define some standard notation. A deterministic finite automaton
(DFA) is denoted by a quintuple (Q, Σ, δ, q_0, F) where Q is the finite set of
states, Σ is the finite input alphabet, δ : Q × Σ → Q is the transition function,
q_0 ∈ Q is the initial state, and F ⊆ Q is the set of accepting states. We generalize
δ to the usual extended transition function with domain Q × Σ*. We use similar
notation to represent a nondeterministic finite automaton with ε-transitions
(ε-NFA), except that the transition function is δ : Q × (Σ ∪ {ε}) → 2^Q. In an ε-NFA it
is possible to have ε-transitions, which are transitions that can be taken without
reading input symbols. For a DFA or ε-NFA M, L(M) is the language accepted
by M. For any x ∈ Σ*, |x| denotes the length of x, and |x|_a, for a ∈ Σ,
denotes the number of occurrences of a in x. We let |Σ| denote the alphabet size.
We use the terms prefix and factor in the following way. If there exist x, z ∈ Σ*
such that w = xyz, we say that y is a factor of w. If x = ε, we also say that y is a
prefix of w. If y is a factor or prefix of w and y ≠ w, then y is a proper factor or
a proper prefix of w, respectively.

In addition to this standard notation, we also define some notation specific
to our problem. For a letter a, we denote its inverse by a^{-1}, and we let the
empty word, ε, be the identity. We consider only alphabets of the form Σ =
Γ ∪ Γ^{-1}, where Γ = {1, 2, ...} and Γ^{-1} = {1^{-1}, 2^{-1}, ...}. For a word
w = a_1 a_2 ··· a_n ∈ Σ*, we denote its inverse by w^{-1} = a_n^{-1} ··· a_2^{-1} a_1^{-1}, and for a language
L ⊆ Σ*, we let L^{-1} = {w^{-1} : w ∈ L}. Note that taking the inverse of a word is
equivalent to reversing it and then applying a homomorphism that maps each
letter to its inverse. Now, we introduce a reduction operation on words, consisting
of removing factors of the form aa^{-1}, with a ∈ Σ. More formally, let us define
the relation ⊢ ⊆ Σ* × Σ* such that, for all w, w' ∈ Σ*, w ⊢ w' if and only if
there exist x, y ∈ Σ* and a ∈ Σ satisfying w = xaa^{-1}y and w' = xy. As usual,
⊢* denotes the reflexive and transitive closure of ⊢.
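
The reduction relation can be made concrete with a small sketch. The encoding is an assumption of this example, not part of the paper: a letter k of Γ is the integer k, its inverse k^{-1} is the integer -k (so inverting twice gives back the letter), and a word is a tuple of such integers. The helper names `one_step` and `reduce_word` are hypothetical; the first lists every w' with w ⊢ w', the second cancels adjacent inverse pairs with a stack.

```python
def inverse(letter):
    """Inverse of a letter under the assumed encoding: k <-> -k."""
    return -letter


def one_step(word):
    """All words w' with word ⊢ w', i.e. with one factor a a^{-1} deleted."""
    return [word[:i] + word[i + 2:]
            for i in range(len(word) - 1)
            if word[i + 1] == inverse(word[i])]


def reduce_word(word):
    """Cancel adjacent inverse pairs until none is left, using a stack."""
    stack = []
    for letter in word:
        if stack and stack[-1] == inverse(letter):
            stack.pop()            # the new letter cancels the last survivor
        else:
            stack.append(letter)
    return tuple(stack)
```

With a = 1 and b = 2, `reduce_word((1, -1, 2, 1, -1, -2, 1))` returns `(1,)`, which matches the second example of the introduction.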

Lemma 1. For each w ∈ Σ* there exists exactly one word r(w) ∈ Σ* such that
w ⊢* r(w) and r(w) does not contain any factor of the form aa^{-1}, with a ∈ Σ.

Proof. First, we prove that if w ⊢ w' and w ⊢ w'' then there exists u ∈ Σ* such
that w' ⊢* u and w'' ⊢* u, i.e., in the terminology of rewriting systems, Σ* with
the relation ⊢ is a locally confluent system.

To this aim, suppose that w = x'aa^{-1}y' = x''bb^{-1}y'', w' = x'y', w'' = x''y'',
for some x', x'', y', y'' ∈ Σ*, a, b ∈ Σ, and, without loss of generality, that
|x'| ≤ |x''|. If |x'| = |x''| then w' = w''; hence, we can take u = w'. Otherwise, if
|x'aa^{-1}| ≤ |x''| then w = x'aa^{-1}zbb^{-1}y'', for some z ∈ Σ*, and the desired word
is u = x'zy''. In the only remaining case, x'a = x'', which implies that b = a^{-1}
and w = x'aa^{-1}ay''. Hence w' = w'' = x'ay'', so we just take u = w'.

Since w ⊢ w' implies that |w'| = |w| − 2, no infinite reduction sequences
are possible. By Newman's lemma [5], this implies that the system is confluent,
namely, for each w ∈ Σ* there exists exactly one word r(w) such that w ⊢* r(w)
and, for each x ∈ Σ*, r(w) ⊢* x implies x = r(w), i.e., r(w) does not contain any
factor aa^{-1}. □

We define the reduced representation of a word w ∈ Σ* as the word r(w) given
in Lemma 1, i.e., the word which is obtained from w by repeatedly replacing with
ε all factors of the form aa^{-1}, for any letter a ∈ Σ, until no such factor exists.
If r(w) = ε we say that w is reducible. We can extend this to languages so that
for L ⊆ Σ*, we let r(L) = {r(w) : w ∈ L}.

Given a language L, the ⊢-closure of L is the set of the words which can be
obtained by "reducing" words of L, i.e., the set {x ∈ Σ* : ∃w ∈ L s.t. w ⊢* x}.
Notice that r(L) coincides with the intersection of the ⊢-closure of L and r(Σ*).

Section 2 examines the reduced representations of regular and context-free
languages. Section 3 provides some bounds on the state complexity of reduced
representations. In Section 4 we look at bounds on the length of the shortest word
in a regular language that reduces to ε. Section 5 demonstrates counterexamples
for some other natural questions.

2 Closure of Reduced Representation 

When considering the reduced representations of languages, it is natural to
wonder if common classes of languages are closed under this operation. In this section
we show that if a language L is regular then r(L) is regular, but if L is
context-free then r(L) need not be context-free.

Lemma 2. For Σ = Γ ∪ Γ^{-1}, where Γ = {1, ..., k} and Γ^{-1} = {1^{-1}, ..., k^{-1}},
there exists a DFA M_k with 2k + 2 states that accepts r(Σ*).

Proof. We notice that a word w is reduced if and only if it does not contain
the factor aa^{-1}, for any a ∈ Σ. This condition can be verified by defining an
automaton that remembers in its finite control the last input letter. To this
aim, the automaton has a state q_a for each a ∈ Σ. If in the state q_a the symbol
a^{-1} is received, then the automaton reaches a dead state q_{-1}.
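
The last-letter idea can be sketched directly. This is an illustration under assumptions that are not in the paper: letters are encoded as integers (k for k, -k for k^{-1}), the initial and dead states are the hypothetical labels "start" and "dead", and each remaining state is simply the last letter read.

```python
def make_reduced_word_dfa(k):
    """Build a 2k + 2 state automaton that remembers the last letter read."""
    letters = list(range(1, k + 1)) + [-i for i in range(1, k + 1)]
    start, dead = "start", "dead"
    states = [start, dead] + letters

    def delta(state, c):
        # The dead state is absorbing; a letter state dies on its own inverse.
        if state == dead or state == -c:
            return dead
        return c

    accepting = set(states) - {dead}
    return states, start, accepting, delta


def accepts(dfa, word):
    """Run the automaton on a tuple of encoded letters."""
    _, start, accepting, delta = dfa
    state = start
    for c in word:
        state = delta(state, c)
    return state in accepting
```

For k = 2 the sketch has six states; it accepts 1 2 1^{-1} and rejects 1 1^{-1}.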

Formally, M_k is the DFA (Q, Σ, δ, q_0, F) defined as follows (see Figure 1 for
an example): Q = {q_0, q_{-1}} ∪ {q_i : i ∈ Γ ∪ Γ^{-1}}, F = Q \ {q_{-1}}, and

    δ(q_a, c) = q_c,      if a ≠ c^{-1};
                q_{-1},   otherwise,

for a ∈ {0} ∪ Σ and c ∈ Σ, while δ(q_{-1}, c) = q_{-1} for every c ∈ Σ. □

Fig. 1. M_2, a DFA that accepts r(Σ*) for |Σ| = 4.

Lemma 3. Given an ε-NFA M = (Q, Σ, δ, q_0, F) with n states, an automaton
M' accepting the ⊢-closure of L(M) can be built in time O(n^4).

Proof. The idea behind the proof is to present an algorithm that, given M,
computes M' by adding to M ε-transitions corresponding to paths on reducible
words. The algorithm is similar to a well-known algorithm for minimizing DFAs
[3, p. 70]. It uses a directed graph G = (Q, E) to remember ε-transitions. For
each pair of states s, t, the algorithm also keeps a set l(s, t) of pairs of states, with
the following meaning: if (p, q) ∈ l(s, t) and the algorithm discovers that there is
a path from s to t on a reducible word (and hence it adds the edge (s, t) to G),
then there exists a path from p to q on a reducible word (thus, the algorithm
can also add the edge (p, q)).

    E ← transitive closure of {(p, q) | q ∈ δ(p, ε)}
    for s, t ∈ Q do l(s, t) ← ∅
    for p, q, s, t ∈ Q do
        if ∃a ∈ Σ s.t. s ∈ δ(p, a) and q ∈ δ(t, a^{-1}) then
            if (s, t) ∈ E then E ← update(E, (p, q))
            else l(s, t) ← l(s, t) ∪ {(p, q)}



The subroutine update returns the smallest set E' having the following properties:

- E ∪ {(p, q)} ⊆ E';

- if (p', q') ∈ E' then each element belonging to l(p', q') is in E';

- the graph (Q, E') is transitive.

At the end of the execution, the automaton M' is obtained by adding to M
an ε-transition from a state p to a state q for each edge (p, q) in the resulting
graph G.
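
The effect of the construction can be illustrated with a much simpler fixed-point computation. The sketch below is not the O(n^4) algorithm of the proof: it repeatedly saturates the edge set and makes no attempt at efficiency. It assumes an integer encoding of letters (k for k, -k for k^{-1}, `None` for ε), it starts from a set E that also contains every pair (p, p), and the names `reducible_pairs` and `closure_accepts` are hypothetical.

```python
from itertools import product


def reducible_pairs(states, transitions):
    """Pairs (p, q) such that some reducible word leads from p to q.

    transitions is a set of triples (source, letter, target).
    """
    E = {(p, p) for p in states}
    E |= {(p, q) for (p, c, q) in transitions if c is None}
    labelled = [t for t in transitions if t[1] is not None]
    changed = True
    while changed:
        new = set()
        # p -a-> s, then a reducible path from s to t, then t -a^{-1}-> q.
        for (p, a, s), (t, b, q) in product(labelled, repeat=2):
            if b == -a and (s, t) in E:
                new.add((p, q))
        # Concatenation of two reducible paths keeps E transitive.
        for (p, r), (r2, q) in product(E, repeat=2):
            if r == r2:
                new.add((p, q))
        changed = not new <= E
        E |= new
    return E


def closure_accepts(states, transitions, start, accepting, word):
    """Does M plus one epsilon-move per pair in E accept the word?"""
    E = reducible_pairs(states, transitions)
    current = {q for (p, q) in E if p == start}
    for c in word:
        moved = {q for (p, a, q) in transitions if a == c and p in current}
        current = {q for (p, q) in E if p in moved}
    return bool(current & accepting)
```

For a chain automaton accepting only 1 2 2^{-1} 1^{-1} 3, the saturated automaton also accepts 1 1^{-1} 3 and 3, but not 2 2^{-1} 3.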

Now, we show that the language accepted by M' is the ⊢-closure of L(M).
To this aim, we observe that for all p, q, s, t ∈ Q such that s ∈ δ(p, a) and
q ∈ δ(t, a^{-1}) are transitions of M for some a ∈ Σ, if M' contains an ε-transition
from s to t then it must also contain an ε-transition from p to q. In fact, when the
algorithm examines these four states in the loop, if (s, t) is in E then the algorithm
calls update to add (p, q) to E. Otherwise, the algorithm adds (p, q) to l(s, t).
Since M' finally contains the ε-transition from s to t, there is a step of
the algorithm, after the insertion of (p, q) in l(s, t), adding the pair (s, t) to E.
The only part of the algorithm able to perform this operation is the subroutine
update.^1 But when the subroutine adds the pair (s, t) to E then it must add all
the pairs in l(s, t). Hence, M' must also contain an ε-transition from p to q. As
a consequence, if w ∈ L(M') and w ⊢ w' then w' ∈ L(M'), i.e., L(M') is closed
under ⊢. Since the algorithm does not remove the original transitions from M,
L(M) ⊆ L(M') and, hence, the ⊢-closure of L(M) is included in L(M').

^1 Notice that the pair (s, t) can be added to E by the subroutine update either because
it is the second argument in the call of the subroutine, or because it belongs to a
list l(p', q'), where (p', q') is added to E in the same call, or because there is a path
from s to t consisting of some arcs already in E and at least one arc added during
the same call of update.

On the other hand, it can be easily shown that for each ε-transition of M'
from a state p to a state q there exists a reducible word z such that q ∈ δ(p, z)
in M. Using this argument, from each word w ∈ L(M') we can find a word
x ∈ L(M) such that x ⊢* w. This permits us to conclude that M' accepts the
⊢-closure of L(M).

Now we show that the algorithm works in O(n^4) time. A naive analysis gives
a running time growing at least as n^4. In fact, the second for-loop iterates over
all 4-tuples of states. Inside the loop the most expensive step is the subroutine
update. This subroutine starts by adding an edge (p, q) to E. For each new
edge (p', q') added to E the subroutine has to add all the edges in l(p', q'),
while keeping the graph transitive. This seems to be an expensive part of the
computation. However, we can observe that each set l(p', q') contains at most n^2
elements. Furthermore, a set l(p', q') is examined only once during the execution
of the algorithm, namely when (p', q') is added to E. Hence, the total time spent
while examining the sets l in all the calls of the subroutine update is O(n^4).
Furthermore, no more than n^2 edges can be inserted into G, and each insertion
can be done in O(n) amortized time while maintaining the transitive closure
[4,6]. Summing up, we get that the overall time of the algorithm is O(n^4). □

By combining the results in the previous lemmata, we are now able to show
the following:

Proposition 1. Given an ε-NFA M = (Q, Σ, δ, q_0, F) with n states, an ε-NFA
M_r such that L(M_r) = r(L(M)) can be built in O(n^4) time.

Proof. The language r(L(M)) is the intersection of the ⊢-closure of L(M) and
r(Σ*). According to Lemma 3, from M we build an automaton M' accepting
the ⊢-closure of L(M). Hence, using standard constructions, from M' and the
automaton obtained in Lemma 2 (whose size is fixed, if the input alphabet is
fixed), we get the automaton M_r accepting r(L(M)) = L(M') ∩ r(Σ*).

The most expensive part is the construction of M', which uses O(n^4) time. □

Corollary 1. For any L ⊆ Σ*, if L is regular then r(L) is regular.

Now we turn our attention to context-free languages and prove that the
analogue of Corollary 1 does not hold. For this we use the notion of a quotient
of two languages.

Definition 1. Given L_1, L_2 ⊆ Σ*, the quotient of L_1 by L_2 is

    L_1/L_2 = {w : ∃x ∈ L_2 such that wx ∈ L_1}.

While the class of regular languages is closed under quotients, the class of
context-free languages is not closed under this operation [2]. It turns out that
the reduced representation of a language can be used to compute quotients.

Lemma 4. For any two languages L_1, L_2 ⊆ Γ*, the language L_3 = r(L_1 L_2^{-1}) ∩
Γ* equals the quotient L_1/L_2.

Proof. We notice that r(wxx^{-1}) = w, for each w, x ∈ Γ*. Hence, given w ∈ Γ*,
it holds that w ∈ L_3 = r(L_1 L_2^{-1}) ∩ Γ* if and only if there exists x ∈ L_2 such
that wx ∈ L_1. Therefore L_3 = L_1/L_2. □
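
Lemma 4 can be checked on small finite languages. The sketch below assumes an integer encoding of letters (k for k, -k for k^{-1}) and words as tuples; `quotient` follows Definition 1 directly, while `quotient_via_reduction` (a hypothetical name) computes r(L_1 L_2^{-1}) ∩ Γ*.

```python
def reduce_word(word):
    """Reduced representation r(word), computed with a stack."""
    stack = []
    for letter in word:
        if stack and stack[-1] == -letter:
            stack.pop()
        else:
            stack.append(letter)
    return tuple(stack)


def inverse_word(word):
    """w^{-1}: reverse the word and invert every letter."""
    return tuple(-c for c in reversed(word))


def quotient(L1, L2):
    """{w : there is x in L2 with wx in L1}, straight from the definition."""
    result = set()
    for u in L1:
        for x in L2:
            cut = len(u) - len(x)
            if cut >= 0 and u[cut:] == x:
                result.add(u[:cut])
    return result


def quotient_via_reduction(L1, L2):
    """r(L1 L2^{-1}) restricted to words without inverse letters."""
    result = set()
    for u in L1:
        for x in L2:
            reduced = reduce_word(u + inverse_word(x))
            if all(c > 0 for c in reduced):
                result.add(reduced)
    return result
```

On L_1 = {121, 22, 1} and L_2 = {21, 2, 11}, both functions return {1, 2}.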

Corollary 2. The class of context-free languages is not closed under r().

Proof. By contradiction, suppose that the class of context-free languages is
closed under r(). Since this class is closed under the operations of reversal,
morphism, concatenation and intersection with a regular language, for any two
context-free languages L_1 and L_2 over Γ, the language L_3 = r(L_1 L_2^{-1}) ∩ Γ* is
also context-free. However, by Lemma 4, L_3 = L_1/L_2, implying that the class
of context-free languages would be closed under quotient, a contradiction. □

3 State Complexity of Reduced Representation 

Here we look at some bounds on the state complexity of the reduced represen- 
tation of a regular language. 

Proposition 2. For any ε-NFA M = (Q, Γ ∪ Γ^{-1}, δ, q_0, F) with n states, Γ =
{1, 2, ..., k} and Γ^{-1} = {1^{-1}, 2^{-1}, ..., k^{-1}} for some positive integer k, there
exists a DFA with at most 2^n (2k + 2) states that accepts r(L(M)).

Proof. The upper bound follows from the algorithm used to prove Proposition 1.
The first part of the construction (i.e., the construction of the automaton M'
accepting the ⊢-closure of the language accepted by M) does not increase the
number of states. The resulting automaton M' can be converted into a DFA
with 2^n states. Finally, to get an automaton accepting r(L(M)) we apply the
usual cross-product construction to this automaton and to the DFA with 2k + 2
states accepting r(Σ*) obtained in Lemma 2. The intersection results in a DFA
with no more than 2^n (2k + 2) states. □

Since each DFA is a fortiori an ε-NFA, the previous proposition gives an
upper bound for the state complexity of the reduced representation.

4 Length of Shortest String Reducing to the Empty Word

Another interesting question is: given a DFA M with n states such that ε ∈
r(L(M)), how long is the shortest w ∈ L(M) such that r(w) = ε? We provide
upper and lower bounds, and we examine the special case of small alphabets.
First we provide an upper bound.
Proposition 3. For any NFA M = (Q, Σ, δ, q_0, F) with n states such that there
exists w ∈ L(M) with r(w) = ε, there exists w' ∈ L(M) such that |w'| ≤ 2^{n^2 − n}
and r(w') = ε.

Proof. Suppose M accepts w ∈ Σ^+ such that r(w) = ε. Then w can be
decomposed in at least one of two ways. Either there exist u, v ∈ Σ^+ such that
w = uv, r(u) = ε and r(v) = ε (Case 1), or there exist u ∈ Σ*, a ∈ Σ such
that w = aua^{-1} and r(u) = ε (Case 2). Any factor w' of w such that r(w') = ε
can also be decomposed in at least one of these two ways, so we can recursively
decompose w and the resulting factors until we have decomposed w into single
symbols. So, we can specify a certain type of parse tree such that M accepts
w ∈ Σ* with r(w) = ε if and only if we can build this type of parse tree for w.

Define our parse tree for a given w as follows. Every internal node corresponds
to a factor w' of w such that r(w') = ε, and the root of the whole tree corresponds
to w. The leaves store individual symbols. When read from left to right, the
symbols in the leaves of any subtree form the word that corresponds to the root
of the subtree. Each internal node is of one of two types:



1. The node has two children, both of which are internal nodes that serve as 
roots of subtrees (corresponds to Case 1). 

2. The node has three children, where the left and the right children are single 
symbols that are inverses of each other, and the child in the middle is empty 
or it is an internal node that is the root of another subtree (corresponds to 
Case 2). 



An example is shown in Figure 2. Now, we fix an accepting computation of
M on input w. We label each internal node t with a pair of states p, q ∈ Q such
that if w' is the factor of w that corresponds to the subtree rooted at t, and
w = xw'y, then p ∈ δ(q_0, x) and q ∈ δ(p, w') are the states reached after reading
the input prefixes x and xw', respectively, during the accepting computation
under consideration. (This also implies that q_f ∈ δ(q, y), with q_f ∈ F, and
(q_0, q_f) is the label associated with the root of the tree.)

Fig. 2. An example parse tree for the word w = a^{-1}bb^{-1}aa^{-1}b^{-1}b^{-1}bba, without
the state pair labels.

If the parse tree of w has two nodes t and u with the same state-pair label
such that u is a descendant of t, then there exists a word shorter than w which
is accepted and reduces to the empty word. This is because we can replace the
subtree rooted at t with the subtree rooted at u. Furthermore, if an internal node
t is labeled with a pair (q, q), for some q ∈ Q, then the factor w' corresponding
to the subtree rooted at t can be removed from w, obtaining a shorter reducible
word. Hence, by a pigeonhole argument, we conclude that the height of the
parse tree corresponding to the shortest reducible word w is at most n^2 − n. We now
observe that the number of leaves of a parse tree of height k defined according
to our rules is at most 2^k. (Such a tree is given by the complete binary tree of
height k, which has no nodes with three children. The avoidance of nodes with
three children is important because such nodes fail to maximize the number
of internal nodes in the tree, which in turn results in fewer than the maximum
number of leaves.) This permits us to conclude that |w| ≤ 2^{n^2 − n}. □
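
The two decomposition cases also suggest a direct way to compute, for every pair of states, the length of a shortest reducible word connecting them. The sketch below is a plain fixed-point relaxation written for illustration, not a procedure from the paper; it assumes an automaton without ε-moves, an integer encoding of letters (k for k, -k for k^{-1}), and the hypothetical names `shortest_reducible_lengths` and `upper_bound`.

```python
import math


def shortest_reducible_lengths(states, transitions):
    """d[(p, q)] = length of a shortest reducible word leading from p to q.

    transitions is a set of triples (source, letter, target); math.inf
    marks pairs that no reducible word connects.
    """
    d = {(p, q): (0 if p == q else math.inf) for p in states for q in states}
    changed = True
    while changed:
        changed = False
        # Case 2: p -a-> s, a reducible word from s to t, then t -a^{-1}-> q.
        for (p, a, s) in transitions:
            for (t, b, q) in transitions:
                if b == -a and d[(s, t)] + 2 < d[(p, q)]:
                    d[(p, q)] = d[(s, t)] + 2
                    changed = True
        # Case 1: concatenation of two reducible words through a state r.
        for p in states:
            for r in states:
                for q in states:
                    if d[(p, r)] + d[(r, q)] < d[(p, q)]:
                        d[(p, q)] = d[(p, r)] + d[(r, q)]
                        changed = True
    return d


def upper_bound(n):
    """The bound 2^(n^2 - n) of Proposition 3 for an n-state automaton."""
    return 2 ** (n * n - n)
```

Values only decrease and stay non-negative integers once finite, so the loop terminates. The shortest accepted reducible word then has length equal to the minimum of d[(q_0, f)] over accepting states f.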

Now we show that there is a lower bound that is exponential in the alphabet 
size and in the number of the states. 

Proposition 4. For all integers n ≥ 3 there exists a DFA, M_n, with n + 1
states over the alphabet Σ = Γ ∪ Γ^{-1}, where Γ = {1, 2, ..., n − 2} and Γ^{-1} =
{1^{-1}, 2^{-1}, ..., (n − 2)^{-1}}, with the property that if w ∈ L(M_n) and r(w) = ε,
then |w| ≥ 2^{n−1}.

Proof. The proof is constructive. Let M_n be the DFA (Q, Σ, δ, q_0, F), illustrated
in Figure 3, where Q = {q_{-1}, q_0, q_1, ..., q_{n−1}}, F = {q_1}, and

    δ(q_a, c) = q_1,      if c = 1 and a = 0;
                q_{a+1},  if c = a^{-1} and 1 ≤ a ≤ n − 2;
                q_0,      if either c = a and 1 < a ≤ n − 2,
                          or c = 1^{-1} and a = n − 1.

Any other transitions lead to the dead state q_{-1}.

Fig. 3. M_n: an (n + 1)-state DFA with the property that for all w ∈ L(M_n) such
that r(w) = ε, |w| ≥ 2^{n−1}. The dead state is not shown.

Now we show that M_n has the desired property. Assume there exists w ∈
L(M_n) such that r(w) = ε. Then |w|_a = |w|_{a^{-1}} for all a ∈ Σ. Since all words in
L(M_n) must contain the symbol 1 (due to the single incoming transition to the
only accepting state), it follows that w must also contain 1^{-1}. Furthermore, the
only possible transition from q_1 not leading to the dead state uses the symbol
1^{-1}. Hence w must begin with the prefix 1 1^{-1}. Since δ(q_0, 1 1^{-1}) = q_2 and the
only two transitions that leave q_2 are on 2 and 2^{-1}, w must contain both 2
and 2^{-1}. Now assume that w contains the symbol i^{-1} with 1 ≤ i < n − 2. Then
the state q_{i+1} must be reached while reading w, thus implying that the symbols
(i + 1) and (i + 1)^{-1} also appear in w. Therefore, by induction, w must contain
at least one occurrence of each a ∈ Σ.

Now we claim that w must contain at least 2^{n−2−a} occurrences of the symbol
a, for 1 ≤ a ≤ n − 2, and hence at least 2^{n−2−a} occurrences of the symbol a^{-1}.

We prove the claim by induction on n − 2 − a. The basis, a = n − 2, follows from
the previous argument. Now, suppose the claim is true for k = n − 2 − a. We prove
it for k + 1 = n − 2 − (a − 1). By the induction hypothesis, w contains at least
2^k occurrences of the symbol a and at least 2^k occurrences of a^{-1}. Observing
the structure of the automaton, we conclude that to have such a number of
occurrences of the two letters a and a^{-1}, the state q_a must be visited at least
2^{k+1} times. On the other hand, the only transition entering q_a is from the state
q_{a−1} on the letter (a − 1)^{-1}. Hence, w must contain at least 2^{k+1} = 2^{n−2−(a−1)}
occurrences of (a − 1)^{-1} and, according to the initial discussion, at least 2^{k+1} =
2^{n−2−(a−1)} occurrences of a − 1. This proves the claim.

By computing the sum over all alphabet symbols, we get that |w| ≥ 2(2^{n−2} −
1). However, since the symbol (n − 2)^{-1} must always be followed by the symbol
1^{-1}, w must actually contain one additional occurrence of each of 1 and 1^{-1}.
Thus |w| ≥ 2^{n−1}. □

It turns out that the shortest word that reduces to ε accepted by the DFA
M_n in the proof of Proposition 4 is related to the well-known ruler sequence,
(ν_2(n))_{n≥1}, where ν_2(n) denotes the exponent of the highest power of 2
dividing n. This sequence has many interesting characterizations, including being the
lexicographically least infinite squarefree word over Z. For integers k ≥ 0, let
r_k = (ν_2(n))_{1≤n<2^k} be the prefix of length 2^k − 1 of the ruler sequence. For
example, r_3 = 0102010.

Let w_n be the shortest word accepted by M_n that reduces to ε. Let w'_n be the
prefix of w_n of length |w_n| − 2. This is well defined for n ≥ 3. Also let w_n = w'_n = ε
for n = 2. Then for any integer n ≥ 3, we have w_n = w'_{n−1} (n − 2) w'_{n−1} (n −
2)^{-1} 1^{-1} 1. It can be easily verified that this word is accepted by M_n. Now define
the homomorphism h such that h(a) = a for a ∈ Γ, and h(a) = ε for a ∈ Γ^{-1}.
Then h(w_n) = r_{n−2} 1.
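
The recurrence for w_n is easy to experiment with. The sketch below makes several assumptions that are not in the text: letters are encoded as integers (k for k, -k for k^{-1}); `run_M` simulates M_n according to one reading of its transition function (state a stands for q_a, and the transition on c = a back to q_0 is taken only for a > 1); and the ruler values are shifted by one so that they can be compared with the letters 1, 2, ....

```python
def reduce_word(word):
    """Reduced representation, computed with a stack."""
    stack = []
    for letter in word:
        if stack and stack[-1] == -letter:
            stack.pop()
        else:
            stack.append(letter)
    return tuple(stack)


def w(n):
    """w_n from the recurrence, with w_2 equal to the empty word."""
    if n == 2:
        return ()
    prev = w_prime(n - 1)
    return prev + (n - 2,) + prev + (-(n - 2), -1, 1)


def w_prime(n):
    """w'_n: w_n without its last two letters (empty for n = 2)."""
    return w(n)[:-2] if n >= 3 else ()


def run_M(n, word):
    """Simulate M_n (assumed reading of delta); False once the run dies."""
    state = 0
    for c in word:
        if state == 0 and c == 1:
            state = 1
        elif 1 <= state <= n - 2 and c == -state:
            state += 1
        elif 1 < state <= n - 2 and c == state:
            state = 0
        elif state == n - 1 and c == -1:
            state = 0
        else:
            return False
    return state == 1


def nu2(m):
    """Exponent of the highest power of 2 dividing m (m >= 1)."""
    count = 0
    while m % 2 == 0:
        m //= 2
        count += 1
    return count


def ruler_prefix(k):
    """r_k: the first 2^k - 1 terms of the ruler sequence."""
    return tuple(nu2(m) for m in range(1, 2 ** k))


def h(word):
    """Erase all inverse letters."""
    return tuple(c for c in word if c > 0)
```

For example, `w(4)` is the word 1 1^{-1} 2 1 1^{-1} 2^{-1} 1^{-1} 1 of length 8, and `h(w(4))` is 1 2 1 1, that is, the shifted ruler prefix 1 2 1 followed by 1.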

The next proposition shows that even over an alphabet of fixed size we can still get
an exponential lower bound.

Proposition 5. For each integer n ≥ 1 there exists a DFA M_n with 3n + 1
states over the alphabet Σ = Γ ∪ Γ^{-1}, where Γ = {1, 2} and Γ^{-1} = {1^{-1}, 2^{-1}},
with the property that the only word w ∈ L(M_n) such that r(w) = ε has length
|w| = 3 · 2^n − 4.

Proof. The proof is constructive. Let M_n be the DFA (Q, Σ, δ, q_n, F) where
Q = {q_{-1}, q_0, q_1, p_1} ∪ {p_i, q_i, r_i : 2 ≤ i ≤ n}, F = {p_n}, and

    δ(q_a, c) = q_{a−1},  if 1 ≤ a ≤ n, and either c = 1 and a ≡ 1 (mod 2),
                                          or c = 2 and a ≡ 0 (mod 2).

    δ(p_a, c) = p_{a+1},  if 1 ≤ a ≤ n − 1, and either c = 1 and a ≡ 0 (mod 2),
                                              or c = 2 and a ≡ 1 (mod 2);
                r_{a+1},  if 1 ≤ a ≤ n − 1, and either c = 1^{-1} and a ≡ 0 (mod 2),
                                              or c = 2^{-1} and a ≡ 1 (mod 2).

    δ(r_a, c) = q_{a−1},  if 2 ≤ a ≤ n, and either c = 1^{-1} and a ≡ 1 (mod 2),
                                          or c = 2^{-1} and a ≡ 0 (mod 2).

    δ(q_0, 1^{-1}) = p_1.

Any other transitions lead to the dead state q_{-1}. The case n = 4 is illustrated
in Figure 4.

Fig. 4. M_4: a (3 · 4 + 1)-state DFA with the property that the only reducible word
accepted by it has length 3 · 2^4 − 4. (The dead state is not represented.)



In order to prove the statement, for each integer m ≥ 0, let us consider the
set C_m of pairs of states, other than the dead state, which are connected by a
reducible word of length m, i.e.,

    C_m = {(s', s'') ∈ Q' × Q' : ∃w ∈ Σ* s.t. |w| = m, r(w) = ε, and δ(s', w) = s''},

where Q' = Q \ {q_{-1}}. We notice that C_m = ∅, for m odd. Furthermore C_0 =
{(s, s) : s ∈ Q'} and, for m > 0, C_m = C'_m ∪ C''_m, where:

    C'_m = {(s', s'') : ∃(r', r'') ∈ C_{m−2}, a ∈ Σ
            s.t. δ(s', a) = r' and δ(r'', a^{-1}) = s''},                      (1)

    C''_m = {(s', s'') : ∃m', m'' > 0, (s', r') ∈ C_{m'}, (r'', s'') ∈ C_{m''}
            s.t. m' + m'' = m and r' = r''}.                                 (2)

We claim that, for each m ≥ 1:

          { {(q_k, p_k)},              if ∃k, 1 ≤ k ≤ n, s.t. m = 3 · 2^k − 4;
    C_m = { {(q_k, r_k), (r_k, p_k)},  if ∃k, 2 ≤ k ≤ n, s.t. m = 3 · 2^{k−1} − 2;   (3)
          { ∅,                         otherwise.
We prove (3) by induction on m.

As already noticed, C_1 = ∅. By inspecting the transition function of M_n, we
can observe that C_2 = {(q_1, p_1)}. Notice that 2 = 3 · 2^1 − 4. This proves the
basis.

For the inductive step, we now consider m > 2 and we suppose (3) true for
integers less than m.

First, we show that we can simplify the formula (2) for C''_m. In fact, using
the inductive hypothesis, for 0 < m', m'' < m, the only possible (s', r') ∈ C_{m'}
and (r'', s'') ∈ C_{m''} satisfying r' = r'' are the pairs (q_j, r_j), (r_j, p_j) ∈ C_{3·2^{j−1}−2},
obtained by taking m' = m'' = 3 · 2^{j−1} − 2, for suitable values of j. This, together
with the condition m' + m'' = m, restricts the set to:

    C''_m = {(s', s'') | ∃r ∈ Q : (s', r) ∈ C_{m/2} and (r, s'') ∈ C_{m/2}}.          (4)

Now we consider three subcases:

Case 1: m = 3 · 2^k − 4, with k ≥ 2.
An easy verification shows that m − 2 cannot be expressed in the form 3 · 2^j − 4
or in the form 3 · 2^j − 2, for any j. Hence, by the inductive hypothesis, C_{m−2} = ∅.
By (1), this implies C'_m = ∅, and then C_m = C''_m.

We now compute C''_m using (4) and the set C_{m/2} obtained according to the
inductive hypothesis. We observe that m/2 = 3 · 2^{k−1} − 2. Hence, for k ≤ n,
C_{m/2} = {(q_k, r_k), (r_k, p_k)} and, thus, C_m = C''_m = {(q_k, p_k)}. On the other hand,
if k > n then C_{m/2} = ∅, which implies C_m = C''_m = ∅.

Case 2: m = 3 · 2^{k−1} − 2, with k ≥ 2.
First, we observe that m/2 cannot be written either as 3 · 2^j − 4 or as 3 · 2^j − 2.
Hence, by the inductive hypothesis, the set C''_m must be empty. Thus, C_m = C'_m.

We compute C'_m as in (1), using the set C_{m−2} obtained according to the
induction hypothesis. We notice that m − 2 = 3 · 2^{k−1} − 4. If k > n + 1 then
C_{m−2} = ∅, thus implying C_m = C'_m = ∅. Otherwise, C_{m−2} = {(q_{k−1}, p_{k−1})}.
In order to obtain all the elements of C'_m, we have to examine the transitions
entering q_{k−1} or leaving p_{k−1}. For k = n + 1 there are no such transitions
and, hence, C_m = C'_m = ∅. For k ≤ n all the transitions entering q_{k−1} or
leaving p_{k−1} involve the same symbol a ∈ {1, 2} or its inverse: there are exactly
two transitions entering q_{k−1} (δ(q_k, a) = δ(r_k, a^{-1}) = q_{k−1}) and exactly two
transitions leaving p_{k−1} (δ(p_{k−1}, a) = p_k and δ(p_{k−1}, a^{-1}) = r_k). Hence, by the
appropriate combinations of these transitions with the only pair (q_{k−1}, p_{k−1}) in
C_{m−2}, we get that C_m = C'_m = {(q_k, r_k), (r_k, p_k)}.

Case 3: Remaining values of m.
If m = 3 · 2^{k−1} with k ≥ 2, then C_{m−2} = {(q_k, r_k), (r_k, p_k)}. All the transitions
entering or leaving r_k use the same symbol a^{-1}, with a ∈ {1, 2}, while all the
transitions entering q_k or leaving p_k use the other symbol b ∈ {1, 2}, b ≠ a, or
b^{-1}. Hence, from (1), C'_m = ∅.

For all the other values of m, the form of m − 2 is neither 3 · 2^j − 4 nor
3 · 2^j − 2. This implies that C_{m−2} = ∅ and, then, C'_m must be empty. Hence, we
conclude that C_m = C''_m.

Suppose C''_m ≠ ∅. From (4) and the inductive hypothesis, it follows that
m/2 = 3 · 2^{j−1} − 2 for some j, thus implying m = 3 · 2^j − 4. This is a contradiction,
because the values of m we are considering are not of this form. Hence, C_m = C''_m
must be empty.

This completes the proof of (3).

Recall that the initial state of M_n is q_n, while the only final state is p_n.
Hence, the length of the shortest reducible word accepted by M_n is the smallest
integer m such that (q_n, p_n) ∈ C_m. According to (3), we conclude that such a
length is 3 · 2^n − 4.

From (3), it also follows that there are no reducible words accepted by M_n
whose length is different from 3 · 2^n − 4. With some small refinements in the
argument used to prove (3), we can show that M_n accepts exactly one reducible
word. In particular, for k ≥ 1 we consider:

          { 1 1^{-1},                            if k = 1;
    w_k = { 1 w_{k−1} 1^{-1} 1^{-1} w_{k−1} 1,   if k > 1 and k odd;
          { 2 w_{k−1} 2^{-1} 2^{-1} w_{k−1} 2,   otherwise.

By an inductive argument it can be proved that w_k is the only reducible word
such that δ(q_k, w_k) = p_k and |w_k| = 3 · 2^k − 4. □
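
The words w_k can be generated and checked mechanically. The sketch below assumes an integer encoding of letters (1, 2 for the letters and -1, -2 for their inverses) and simulates M_n following one reading of the transition function given in the proof; a state is a pair such as ('q', a) for q_a, and the function names are hypothetical.

```python
def reduce_word(word):
    """Reduced representation, computed with a stack."""
    stack = []
    for letter in word:
        if stack and stack[-1] == -letter:
            stack.pop()
        else:
            stack.append(letter)
    return tuple(stack)


def w(k):
    """w_k from the recurrence at the end of the proof."""
    if k == 1:
        return (1, -1)
    a = 1 if k % 2 == 1 else 2
    prev = w(k - 1)
    return (a,) + prev + (-a, -a) + prev + (a,)


def run_M(n, word):
    """Simulate M_n from q_n; True if the run ends in p_n."""
    kind, a = "q", n
    for c in word:
        odd = a % 2 == 1
        if kind == "q" and a >= 1 and c == (1 if odd else 2):
            a -= 1                                  # q_a -> q_{a-1}
        elif kind == "q" and a == 0 and c == -1:
            kind, a = "p", 1                        # q_0 -> p_1
        elif kind == "p" and a <= n - 1 and c == (2 if odd else 1):
            a += 1                                  # p_a -> p_{a+1}
        elif kind == "p" and a <= n - 1 and c == (-2 if odd else -1):
            kind, a = "r", a + 1                    # p_a -> r_{a+1}
        elif kind == "r" and c == (-1 if odd else -2):
            kind, a = "q", a - 1                    # r_a -> q_{a-1}
        else:
            return False                            # dead state
    return (kind, a) == ("p", n)
```

For instance, `w(2)` is 2 1 1^{-1} 2^{-1} 2^{-1} 1 1^{-1} 2, of length 8 = 3 · 2^2 − 4, and `run_M(2, w(2))` is true.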

We now examine the special case where Σ = {1, 1^{-1}}, and give a cubic upper
bound and a quadratic lower bound. The next proposition gives the upper bound,
which holds even in the nondeterministic case.

Proposition 6. Let M = (Q, {1, 1^{-1}}, δ, q_0, F) be an NFA with n states such
that ε ∈ r(L(M)). Then M accepts a reducible word of length at most n(2n^2 + 1).

Proof. We prove the result by contradiction. Assume the shortest w ∈ L(M)
such that r(w) = ε has |w| > n(2n^2 + 1). Define a function b on words by
b(z) = |z|_1 − |z|_{1^{-1}} for z ∈ Σ*. Roughly speaking, the function b measures the
"balance" between the number of occurrences of the symbol 1 and those of the
symbol 1^{-1} in a word.

Suppose that no factor w' of w has |b(w')| > n^2. Then the function b can
take on at most 2n^2 + 1 distinct values. Since |w| > n(2n^2 + 1), there must be
a value C such that b takes the value C for more than n different prefixes of
w. That is, there is some ℓ ≥ n such that w = x y_1 y_2 ··· y_ℓ z where the y_i are
nonempty and

    b(x) = b(x y_1) = b(x y_1 y_2) = ··· = b(x y_1 y_2 ··· y_ℓ).

Consider a sequence of ℓ + 1 states p_0, p_1, ..., p_ℓ ∈ Q, and a state q_f ∈ F, such
that during an accepting computation M makes the following transitions:

    p_0 ∈ δ(q_0, x), p_1 ∈ δ(p_0, y_1), ..., p_ℓ ∈ δ(p_{ℓ−1}, y_ℓ), q_f ∈ δ(p_ℓ, z).

Since ℓ ≥ n, a state must repeat in the above sequence, say p_i = p_j with
i < j. Then u = x y_1 ··· y_i y_{j+1} ··· y_ℓ z is shorter than w (since we have omitted
y_{i+1} ··· y_j) and it is accepted by M. Furthermore, observing that b(y_{i+1} ··· y_j) =
0, we conclude that r(u) = ε. Since |u| < |w|, this is a contradiction to our choice
of w. Hence, w must contain a factor w' such that |b(w')| > n^2.

Let y be the shortest factor of w such that b(y) = 0 and |b(y')| > n^2 for
some prefix y' of y. We can write w = xyz, for suitable words x, z. Let D be the
maximum value of |b(y')| over all prefixes y' of y. We suppose that D > 0. (The
argument can be easily adapted to the case D < 0.) For i = 0, 1, 2, ..., D, let
R(i) be the shortest prefix of y with b(R(i)) = i. Similarly, let S(i) be the longest
prefix of y with b(S(i)) = i. Again, consider an accepting computation of M on
input w. For each pair [R(i), S(i)], let [P(i), Q(i)] be the pair of states such that
M is in state P(i) after reading xR(i) and M is in state Q(i) after reading xS(i).
Since D > n^2, some pair of states repeats in the sequence {[P(i), Q(i)]}. That is,
there exist j < k such that [P(j), Q(j)] = [P(k), Q(k)]. We may therefore omit
the portion of the computation that occurs between the end of R(j) and the
end of R(k) as well as that which occurs between the end of S(k) and the end
of S(j) to obtain a computation accepting a shorter word u such that r(u) = ε.
Again we have a contradiction, and our result follows. □



The following proposition gives a quadratic lower bound. 



Proposition 7. For each integer n > there exists a DFA M n with n+l states, 
over the alphabet £ = {1,1 -1 }, such that the only reducible word w € L(M n ), 
has length (n + l)(n — l)/2 if n is odd, and n 2 /2 if n is even. 



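Proof. The proof is constructive. Let $M_n$ be the DFA $(Q, \Sigma, \delta, q_0, F)$, illustrated
in Figure 5, where $Q = \{q_{-1}\} \cup \{q_i : 0 \le i < n\}$, $F = \{q_{\lfloor n/2 \rfloor}\}$, and

$$\delta(q_a, c) = \begin{cases}
q_{a+1}, & \text{if either } c = 1 \text{ and } 0 \le a < \lfloor n/2 \rfloor, \\
         & \text{or } c = 1^{-1} \text{ and } \lfloor n/2 \rfloor \le a \le n - 2; \\
q_{a \bmod 2}, & \text{if } c = 1^{-1} \text{ and } a = n - 1.
\end{cases}$$

Any other transitions lead to the dead state, $q_{-1}$.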

Observe that if $n$ is odd, then each word $w$ accepted by $M_n$ has the form
$w = (1^{\lfloor n/2 \rfloor} (1^{-1})^{\lceil n/2 \rceil})^{\alpha} \, 1^{\lfloor n/2 \rfloor}$ for an $\alpha \ge 0$. Computing the "balance" function $b$
introduced in the proof of Proposition 6, we get $b(w) = \frac{1}{2}(n - 1 - 2\alpha)$, which is
0 if and only if $\alpha = (n-1)/2$. Finally, by computing the length of $w$ for such an $\alpha$,
we obtain $(n+1)(n-1)/2$.

Similarly, in the case of $n$ even, we can prove that the only reducible word
accepted by $M_n$ has length $n^2/2$. □
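
The construction can be checked mechanically for small $n$. The following Python sketch is only an illustration under stated assumptions: states are encoded by their indices (with $-1$ for the dead state), the letter $1$ is encoded as `+1` and its inverse as `-1`, the balance $b(w)$ is taken to be the number of occurrences of $1$ minus the number of occurrences of $1^{-1}$, and a word is treated as reducible when its balance is 0. The names `delta`, `reducible_accepted` and `claimed_length` are hypothetical helpers, not part of the paper.

```python
from collections import Counter

DEAD = -1  # index of the dead state q_{-1}


def delta(n: int, a: int, c: int) -> int:
    """Transition function of M_n on state index a.

    The letter 1 is encoded as c = +1 and its inverse as c = -1.
    """
    half = n // 2
    if a == DEAD:
        return DEAD
    if c == +1 and 0 <= a < half:
        return a + 1
    if c == -1 and half <= a <= n - 2:
        return a + 1
    if c == -1 and a == n - 1:
        return a % 2
    return DEAD


def reducible_accepted(n: int, max_len: int) -> list:
    """Return (length, number of words) for accepted words of balance 0."""
    accepting = n // 2
    found = []
    # Maps (state, balance) to the number of words of the current length
    # that reach it from q_0 without entering the dead state.
    frontier = Counter({(0, 0): 1})
    for length in range(max_len + 1):
        if frontier[(accepting, 0)] > 0:
            found.append((length, frontier[(accepting, 0)]))
        step = Counter()
        for (a, b), count in frontier.items():
            for c in (+1, -1):
                nxt = delta(n, a, c)
                if nxt != DEAD:
                    step[(nxt, b + c)] += count
        frontier = step
    return found


def claimed_length(n: int) -> int:
    """Length stated in Proposition 7."""
    return (n + 1) * (n - 1) // 2 if n % 2 == 1 else n * n // 2


# Exactly one balance-0 accepted word, of the claimed length, up to 2n^2.
for n in range(1, 9):
    assert reducible_accepted(n, 2 * n * n) == [(claimed_length(n), 1)]
```

The search is bounded by `max_len`, so it confirms the statement only for words up to that length; the proof above is what rules out longer reducible words.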
Fig. 5. Top: $M_n$ where $n$ is odd. Bottom: $M_n$ where $n$ is even.



5 Counterexamples 

Here we look at counterexamples that answer two natural questions. The first
question is whether the set of all words equivalent to those in a regular language
must be regular. The second question is whether any of our results hold for a
more generalized form of the reduced representation. We now define the set of
equivalent words.

Definition 2. For $w \in \Sigma^*$, the set of equivalent words is

$\mathrm{eq}(w) = \{w' : r(w) = r(w')\}$.

For $L \subseteq \Sigma^*$, the set of equivalent words is

$\mathrm{eq}(L) = \bigcup_{w \in L} \mathrm{eq}(w)$.

Proposition 8. There exists $L \subseteq \Sigma^*$ such that $L$ is regular but $\mathrm{eq}(L)$ is not.

Proof. Consider the following example. Let $\Sigma = \{1, 1^{-1}\}$ and let $L = \{\epsilon\}$.
Then $\mathrm{eq}(L)$ is the language of balanced parentheses, which is well known to be
nonregular. □
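
One standard way to see the nonregularity is a Myhill-Nerode style argument: the prefixes $1^i$ are pairwise distinguishable, because $1^i (1^{-1})^j$ reduces to $\epsilon$ exactly when $i = j$, so no finite set of states can separate them all. The Python sketch below checks this on small instances. It assumes, purely for illustration, that the letter $1$ is encoded as `+1` and its inverse as `-1`; `reduce_word` and `in_eq_of_empty` are hypothetical helper names.

```python
def reduce_word(word):
    """Cancel adjacent inverse pairs with a stack; return the reduced word."""
    stack = []
    for sym in word:
        if stack and stack[-1] == -sym:
            stack.pop()  # sym cancels the symbol on top of the stack
        else:
            stack.append(sym)
    return stack


def in_eq_of_empty(word):
    """Membership in eq({epsilon}): the word must reduce to the empty word."""
    return reduce_word(word) == []


# 1^i (1^{-1})^j belongs to eq({epsilon}) exactly when i == j.
N = 6
table = [[in_eq_of_empty([+1] * i + [-1] * j) for j in range(N)]
         for i in range(N)]
assert all(table[i][j] == (i == j) for i in range(N) for j in range(N))
```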

Proposition 9. Let $L$ be a regular language over an alphabet $\Sigma$. The language
$\mathrm{eq}(L)$ is context-free.

Proof. Let $M$ be a DFA accepting $L$. We first construct the $\epsilon$-NFA $M_r$ of
Proposition 1 that accepts $r(L)$. We then reverse the transitions of $M_r$ to obtain an
$\epsilon$-NFA $A$ that accepts the reversal of $r(L)$. We now construct a PDA $B$ that
accepts $\mathrm{eq}(L)$. The operation of $B$ is as follows. On input $w$, the PDA $B$ reads
each symbol of $w$ and compares it with the symbol on top of the stack. If the
symbol being read is $a$ and the symbol on top of the stack is the inverse of $a$, the
machine $B$ pops the top symbol of the stack. Otherwise, the machine $B$ pushes
$a$ on top of the stack. After the input $w$ is consumed, the stack contains a word
$z$ that is equivalent to $w$. Moreover, since $z$ does not contain any factor of the
form $aa^{-1}$, the word $z$ must equal $r(w)$ by Lemma 1. Finally, on $\epsilon$-transitions,
the PDA $B$ pops each symbol of $z$ off the stack and simulates the computation
of $A$ on each popped symbol. The net effect is to simulate $A$ on $z^R$ (the reversal
of $z$). If $z^R$ is accepted by $A$, the PDA $B$ accepts $w$. Since $z^R$ is accepted by
$A$ if and only if $z \in r(L)$, the PDA $B$ accepts $w$ if and only if $r(w) \in r(L)$.
However, we have $r(w) \in r(L)$ if and only if $w \in \mathrm{eq}(L)$, so $B$ accepts $\mathrm{eq}(L)$, as
required. □
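
The two phases of $B$ can be sketched in Python. This is a simulation under assumptions, not the construction itself: a lowercase letter stands for a symbol and the matching uppercase letter for its inverse (an encoding chosen here for convenience), and the $\epsilon$-NFA $A$ is replaced by an arbitrary predicate `accepts_reversal_of_rL` supplied by the caller. In the example, $L = (ab)^*$, whose words are already reduced, so $r(L) = (ab)^*$ and its reversal is $(ba)^*$.

```python
import re


def inverse(sym: str) -> str:
    """Inverse of a symbol under the assumed encoding: a <-> A, b <-> B, ..."""
    return sym.swapcase()


def pda_accepts(word: str, accepts_reversal_of_rL) -> bool:
    """Simulate B: reduce on a stack, then feed the popped symbols to A."""
    stack = []
    # Phase 1: read the input, cancelling a symbol against its inverse
    # whenever the inverse is on top of the stack.
    for sym in word:
        if stack and stack[-1] == inverse(sym):
            stack.pop()
        else:
            stack.append(sym)
    # Phase 2: pop the stack. The symbols come out in reverse order,
    # so the automaton for the reversal of r(L) sees z^R.
    popped = []
    while stack:
        popped.append(stack.pop())
    return accepts_reversal_of_rL("".join(popped))


def accepts_ba_star(s: str) -> bool:
    """Stand-in for A when L = (ab)*: accepts the reversal (ba)* of r(L)."""
    return re.fullmatch(r"(ba)*", s) is not None


assert pda_accepts("aAab", accepts_ba_star)          # reduces to ab
assert pda_accepts("abBbAaab", accepts_ba_star)      # reduces to abab
assert not pda_accepts("abB", accepts_ba_star)       # reduces to a
```

Popping the stack yields $z$ in reverse, which is why the proof runs the reversed automaton $A$ rather than $M_r$ itself.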

A set of equivalent words as described above can be thought of as an
equivalence class under an equivalence relation described by a very particular set of
equations: for all $a \in \Sigma$, $aa^{-1} = \epsilon$. Our generalization is to allow for an arbitrary
set of equations, which we will refer to as the "defining set of equations". Then
a generalized reduced representation of a word is an equivalent word such that
there are no shorter equivalent words. It is no longer necessary that the reduced
representation be unique, so we denote the set of generalized reduced
representations of a word $w$ as $r_g(w)$. For example, if $\Sigma = \{a, b, c, d\}$ we may have the
defining set of equations $\{ab = cd, bc = a\}$. Then $\mathrm{eq}(abd) = \{abd, cdd, bcbd\}$ and
$r_g(abd) = \{abd, cdd\}$. It is natural to wonder whether an analogous result to
Corollary Q] holds under this generalized form of reduced representations. 
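
The example with the equations $ab = cd$ and $bc = a$ can be reproduced by a breadth-first search that applies each equation in both directions. The Python sketch below is a bounded exploration, not a general decision procedure: it only visits words of length at most `max_len`, so in general it may miss equivalences that pass through longer words. The names `equivalent_words` and `generalized_reduced` are hypothetical.

```python
from collections import deque


def equivalent_words(word, equations, max_len):
    """Words reachable from `word` by rewriting with the equations in
    either direction, visiting only words of length at most max_len."""
    rules = []
    for lhs, rhs in equations:
        rules.append((lhs, rhs))
        rules.append((rhs, lhs))
    seen = {word}
    queue = deque([word])
    while queue:
        current = queue.popleft()
        for lhs, rhs in rules:
            start = current.find(lhs)
            while start != -1:
                candidate = current[:start] + rhs + current[start + len(lhs):]
                if len(candidate) <= max_len and candidate not in seen:
                    seen.add(candidate)
                    queue.append(candidate)
                start = current.find(lhs, start + 1)
    return seen


def generalized_reduced(word, equations, max_len):
    """Shortest words among those found equivalent to `word`."""
    found = equivalent_words(word, equations, max_len)
    shortest = min(len(w) for w in found)
    return {w for w in found if len(w) == shortest}


equations = [("ab", "cd"), ("bc", "a")]
assert equivalent_words("abd", equations, max_len=8) == {"abd", "cdd", "bcbd"}
assert generalized_reduced("abd", equations, max_len=8) == {"abd", "cdd"}
```

For this particular word the class is finite and closes well below the length bound, so the bounded search finds all of it.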

Proposition 10. There exists $L \subseteq \Sigma^*$ such that $L$ is regular but $r_g(L)$ is not
even context-free.

Proof. Consider the following example. Let $\Sigma = \{a, b, c\}$, let the defining set of
equations be $\{ab = ba, ac = ca, bc = cb\}$ and let $L = (abc)^*$. Then $r_g(L)$ is the
language $\{x : |x|_a = |x|_b = |x|_c\}$, which is known to be noncontext-free, and
hence also nonregular. □
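
The identification of $r_g(L)$ rests on two observations: the three equations preserve length, so every equivalent word is already a shortest one, and adjacent swaps of distinct letters generate every rearrangement of a word. A small Python check of the second observation for $(abc)^n$ with $n \le 2$ follows; `commutation_class` is a hypothetical helper name, and the check is an illustration on small cases rather than a proof.

```python
from itertools import permutations


def commutation_class(word):
    """Closure of `word` under swapping adjacent distinct letters,
    i.e. under ab = ba, ac = ca, bc = cb over the alphabet {a, b, c}."""
    seen = {word}
    stack = [word]
    while stack:
        current = stack.pop()
        for i in range(len(current) - 1):
            if current[i] != current[i + 1]:
                swapped = (current[:i] + current[i + 1] + current[i]
                           + current[i + 2:])
                if swapped not in seen:
                    seen.add(swapped)
                    stack.append(swapped)
    return seen


# Every rearrangement of (abc)^n is reached, and nothing else.
for n in range(3):
    word = "abc" * n
    assert commutation_class(word) == {"".join(p) for p in permutations(word)}
```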